Creation of tree-based and customized industry-oriented knowledge base

ABSTRACT

A customized industry-oriented knowledge base (CIO KB) with information which is relevant to a user's interests includes information about different relevant natural/technical items or processes relating to a given industry or discipline. This involves forming the CIO KB on the basis of a tree of the CIO KB comprising names of items, processes, and parameters which are relevant to the given industry. The CIO KB is formed from an SAO KB (subject-action-object knowledge base) by selection of all the SAOs comprising the mentioned names of relevant items, processes, or parameters in their subjects or objects.

RELATED APPLICATIONS

[0001] This application is a continuation-in-part of U.S. patent applications Ser. Nos. 60/199,658 filed Apr. 25, 2000 and 60/199,921 filed Apr. 26, 2000, and is related to copending U.S. patent application Ser. No. 09/541,192 filed Apr. 3, 2000, which is a continuation application of copending U.S. patent application Ser. No. 09/345,547, filed Jun. 30, 1999, which is a continuation-in-part of copending U.S. patent application Ser. No. 09/321,804 filed May 27, 1999, and is also related to the copending provisional application of Galina Troyanova entitled Synonym Extension Of Search Queries With Validation being filed concurrently herewith. These applications are herewith incorporated herein by reference.

FIELD OF THE INVENTION

[0002] This invention relates to computer based knowledge bases, and particularly to creation of specialized knowledge bases from various natural language texts.

BACKGROUND OF THE INVENTION

[0003] Computer based document search processors are known to perform key word searches for publications on the World Wide Web and other sources of information. Today a user can download 10,000 papers from the Web by typing the word “Screen”. These can include computer screen, TV screen, window screen, and other screens. Because of the enormous amount of information available on the Web, key word search processors produce too much downloaded information, the vast majority of which is irrelevant or immaterial to the information the user wants.

[0004] Various attempts purport to increase the recall and precision of the selection, such as U.S. Pat. Nos. 5,774,833 and 5,794,050, incorporated here by reference; however, these methods simply rely on key word or phrase searching. U.S. Pat. No. 6,167,370 discloses means to semantically process candidate documents for specific technological functions and specific physical effects so that fewer prioritized articles meeting the search criteria are presented or identified to the user. The application proposes Subject-Action-Object extractions within each sentence and stores them.

[0005] A Subject-Action-Object Knowledge Base (SAO KB) contains fields with subjects, actions, and objects and is prepared from natural language texts with the help of a semantic processor. These are described in copending U.S. patent application Ser. No. 09/541,192 filed Apr. 3, 2000. However, the size of an SAO KB, when it exceeds 100 million SAOs, may make it cumbersome to obtain specialized information in a limited field.

[0006] An object of the invention is to improve search systems of this type and to produce a customized industry-oriented knowledge base (CIO KB).

SUMMARY OF EMBODIMENTS OF THE INVENTION

[0007] An embodiment of the invention involves an industry-oriented knowledge base tree submitting a computer search query and extracting documents from a document source on the basis of the query; semantically processing language from extracted documents in a semantic processor to obtain subject-action-object groups (SAOs); selecting relevant results from the SAOs and entering the relevant results back into the knowledge base tree; successively submitting new queries from the knowledge base tree so as to extract additional documents from the document source and semantically processing SAOs from extracted documents and in a loop successively reentering relevant results obtained from the SAOs back into the knowledge base tree; and extracting information from the knowledge base tree and the SAOs to produce a customized industry-oriented knowledge base (CIO KB).

[0008] These and other aspects, objects, and advantages of the invention will become evident from the following description of exemplary embodiments when read in light of the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] FIG. 1 is a block diagram of a computer system containing a computer program embodying this invention.

[0010] FIG. 2 is a flow chart illustrating operation of the computer program in FIG. 1.

[0011] FIG. 3 is a flow chart showing further details of the computer program of FIG. 2.

[0012] FIGS. 4a, 4b, and 4c are examples of screens appearing in the monitor of the computer of FIG. 1 and data from the program of FIGS. 2 and 3.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0013] The following are incorporated herein by reference:

[0014] I. The system and on-line information service presently available at www.cobrain.com and the publicly available user manual therefor.

[0015] II. The software product presently marketed by Invention Machine Corporation of Boston, Mass., USA, under its trademark “KNOWLEDGIST” and the publicly available user manual therefor.

[0016] III. U.S. Pat. No. 6,167,370.

[0017] IV. U.S. patent application Ser. No. 09/541,182 filed Apr. 3, 2000.

[0018] V. The software product presently marketed by Invention Machine Corporation of Boston, Mass., USA, under its trademark “TECHOPTIMIZER” and the publicly available user manual therefor.

[0019] VI. U.S. Pat. No. 5,901,068.

[0020] In FIG. 1, a tool or program for creating a tree-based and industry-oriented knowledge base embodying the invention resides in a personal computer 12 that includes a CPU 14, a monitor 16, a keyboard/mouse 18, and a printer 20. The program may be stored on a portable disk and inserted in a disk reader slot 22 or on a fixed disk in the computer or on a ROM. According to an alternate embodiment the program resides on a server and the user accesses the program via the communication ports 23, a LAN (local area network), WAN (wide area network), or the Internet. Computer 12 can be conventional and be of any suitable make or brand. Other peripherals and modem/network interfaces can be provided as desired. For convenience the program utilizes the displays in the system and on-line information service presently available at www.cobrain.com.

[0021] FIG. 2 is a flow chart that illustrates a tool embodying the invention. To start, a user is invited to create or enter a knowledge base tree. It may be entered in an ordinary word-processing program or a database program and imported into the program of FIG. 2. This knowledge base tree is hereafter referred to as the tree of the CIO KB.

[0022] According to an embodiment, the tree of the CIO KB is in the form of a single word, but according to another embodiment, it is a multilevel hierarchical list of items and/or processes (technical, natural, or other) and/or their parameters with synonyms related to a given industry or discipline. According to an embodiment, pre-formulated industry trees are stored in a dictionary that enables a user to search for a selected tree and enter a desired tree. In addition, the user can enter a manual mode and enter terms to generate a tree of the user's own interest.

[0023] The tree includes the names of the tree's branches and expressions for a search, in object/subject form, of an SAO KB. If an SAO contains these expressions in its subject or object, this SAO is included in the given tree's branch. A user can choose the classification type: for subjects, or for objects. The object classification follows:

[0024] A multilevel CIO KB tree has the following form:

First level of tree | Intermediate level of tree | Last level of tree | Synonymous or near-synonymous expressions for last level of tree (used for search in object/subject in SAO KB)
Microelectronics    | Lithography                | Resist             | Resist; Photoresist layer
                    |                            | Wafer              | Wafer; Substrate

[0025] The general scheme of the tool appears in FIG. 2. It includes the following stages performed by the computer 12:

[0026] 1. Preparing an initial list of queries 1010 from the names of items or processes, or their parameters, extracted from a given branch or branches of the tree of the CIO KB 1020. There are several ways to prepare the list of queries. In a first embodiment, queries are formed from expressions of the last level of the tree connected by the Boolean operator “OR”; for example:

[0027] [Resists] OR [Photoresist layer];

[0028] [Wafer] OR [Substrate].

[0029] According to another, more complicated but more accurate embodiment, queries are formed from expressions at the last level of the tree joined by “OR”, with the name of a higher level connected by “AND”.

[0030] For example:

[0031] [Lithography] AND {[Resists] OR [Photoresist layer]};

[0032] [Lithography] AND {[Wafer] OR [Substrate]}.

[0033] If the tree of the CIO KB is initially empty, the user may prepare an initial query.

[0034] 2. Searching for documents related to these queries in external information sources at 1030 (WWW, intranet, or other external documents).

[0035] 3. A semantic processor at 1040 treats the found documents. For this purpose, it extracts all subject-action-object (SAO) relations from the documents at 1050 and extracts noun groups from the documents at 1060 (according to U.S. patent application Ser. No. 09/541,182 filed Apr. 3, 2000). Usually, noun groups represent the names of items/processes, or parameters.

[0036] 4. Automatic selection at 1070 of noun groups (items/processes, or parameters) relevant to a found document.

[0037] According to an embodiment, the following algorithm is used to calculate the relevance of noun groups extracted from a document.

[0038] A. Extract all significant words (nouns and adjectives) from the noun group by tags.

[0039] B. Calculate the estimating value (weight) of each significant word of the noun group. To calculate the estimating value the algorithm takes into account:

[0040] The word frequency in the document;

[0041] Whether the word is in a subject or an object;

[0042] Whether the word takes part in some semantic relation of an SAO, in other words, whether it is included in the main word of the noun group;

[0043] Whether the word is part of the title.

[0044] C. Calculate the final estimating value of a noun group as the arithmetic mean of the estimating values of all its constituent significant words.

[0045] A higher estimating value indicates a noun group that is more relevant to the source document.

[0046] In addition to selection of relevant noun groups, filtration, according to an embodiment, is accomplished with the help of a stop-dictionary that includes overly general expressions.

[0047] At unit 1080, the user can remove, edit, and/or classify noun groups.

[0048] 5. A list of selected items/processes, or parameters is added at 1090 to the same branch of the tree of the CIO KB from which the initial list of queries was extracted. This renews and extends the tree 1020 of the CIO KB. The extended tree 1020 serves for producing the next generation of queries. According to an embodiment, this procedure is performed in a loop.

[0049] 6. SAOs extracted by the semantic processor 1040 from external documents 1030 form a new SAO KB at 1100 or are merged into an existing SAO KB. The tree 1020 is used to create the CIO KB at 1110 from the SAO KB at 1100.

[0050] At first, a search is performed for SAOs whose objects contain the expressions of the last level of the tree. Then, the found SAOs, their original sentences, and references are joined with the given branch of the tree. Hierarchically organized SAOs, their original sentences, and references constitute the CIO KB.

[0051] Extension of the tree 1020 causes extension of the created CIO KB.

[0052] Thus the user can prepare his/her own (customized) tree and CIO KB. Moreover, the tool of this embodiment employs positive feedback: an extended tree generates extended queries and, as a consequence, a larger volume of relevant text information enters the CIO KB at 1110. This is called a “self-learning system”.
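The feedback loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of the tool: the function name `grow_branch`, the toy corpus, and the stand-ins for the search module, semantic processor, and selection step are all hypothetical.

```python
def grow_branch(terms, search, extract_noun_groups, select, rounds=2):
    """Extend one tree branch by repeatedly querying with its current terms."""
    terms = list(terms)
    for _ in range(rounds):
        # Next generation of queries: current expressions joined by OR.
        query = " OR ".join(f"[{t}]" for t in terms)
        documents = search(query)
        candidates = [g for d in documents for g in extract_noun_groups(d)]
        # Relevant noun groups re-enter the same branch (the feedback step).
        for group in select(candidates):
            if group not in terms:
                terms.append(group)
    return terms


# Toy stand-ins (hypothetical) for the external sources and semantic processor.
CORPUS = ["resist is exposed to UV light", "UV light is focused by a lens"]
NOUN_GROUPS = {CORPUS[0]: ["resist", "UV light"], CORPUS[1]: ["UV light", "lens"]}


def toy_search(query):
    """Return every toy document that mentions at least one query term."""
    wanted = [t.strip("[] ") for t in query.split(" OR ")]
    return [d for d in CORPUS if any(t in d for t in wanted)]


branch = grow_branch(["resist"], toy_search, NOUN_GROUPS.get, lambda groups: groups)
print(branch)
```

Starting from the single term "resist", the first round adds "UV light", and the query extended with "UV light" then reaches the second document and adds "lens". Each extension of the branch widens the next query.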

[0053] A more detailed embodiment of a tool appears in FIG. 3. Here an input unit 110 receives initial tree data 120 from a user or automatically. It is possible to begin from an initial tree having only one word or expression. Initial tree data can be represented in any text format. Tree data 120 are transmitted into a tree formation or renewal module 130, which forms the tree 140 of the CIO KB.

[0054] The content from the tree 140 (either all expressions at the last levels of the tree or only expressions that were selected by the user) is transmitted into a queries formation module 150, which forms a query or a set of queries 160. In addition, content of the tree 140 passes into a CIO KB formation module 260 for formation of a CIO KB 300, which is made available for display to the user by an output unit 310. The display appears in FIG. 4.

[0055] Queries 160 pass into a search module 170. The search module 170 uses the queries 160 to search documents from different external information sources 180. The search module 170 downloads the found relevant documents and transmits them to a semantic processor 190.

[0056] The semantic processor 190 extracts noun groups 200 from the natural language text documents. The semantic processor 190 also converts natural language texts into Subject-Action-Object (SAO) relations. This SAO data 280 is stored in an SAO Knowledge Database (SAO KB) 290.

[0057] For example, semantic processor 190 can extract the following noun groups: “Thin photoresist layer” and “UV laser light” from the sentence: “Thin photoresist layer is heated by UV laser light” and convert it into the following fields in the SAO KB:

[0058] Subject: “UV laser light”;

[0059] Action: “heat”;

[0060] Object: “Thin photoresist layer”.
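One possible in-memory form of such an SAO record is sketched below. The class name `SAO`, its field names, and the helper `brief` are illustrative assumptions; the source specifies only that subject, action, and object fields are stored, together with the original sentence and a reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SAO:
    """One subject-action-object relation with its provenance (hypothetical layout)."""
    subject: str
    action: str
    obj: str
    sentence: str = ""   # original sentence the relation was extracted from
    reference: str = ""  # path or URL of the source document


def brief(sao):
    """Render the short subject / action / object form of a record."""
    return f"{sao.subject} - {sao.action} - {sao.obj}"


sao = SAO(
    subject="UV laser light",
    action="heat",
    obj="Thin photoresist layer",
    sentence="Thin photoresist layer is heated by UV laser light",
)
print(brief(sao))
```

The passive sentence is stored with the acting item as subject, matching the field assignment given above.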

[0061] The initial list of noun groups 200 extracted by semantic processor 190 is transmitted into selection module 210. Selection module 210 removes non-informative noun groups and performs the selection of relevant noun groups. Removal of non-informative noun groups is performed with the help of a stop-dictionary that includes overly general expressions, such as “method”, “device”, “advanced technology”, etc.
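The stop-dictionary step can be sketched as a simple filter. The three entries come from the text; the function name and the matching rule (case-insensitive comparison of the whole noun group) are assumptions, since the source does not say how entries are compared.

```python
# Entries taken from the examples in the text; a real stop-dictionary would be larger.
STOP_DICTIONARY = {"method", "device", "advanced technology"}


def remove_non_informative(noun_groups, stop_dictionary=STOP_DICTIONARY):
    """Drop noun groups that appear in the stop-dictionary (assumed exact match)."""
    return [g for g in noun_groups if g.strip().lower() not in stop_dictionary]


kept = remove_non_informative(["Method", "UV laser light", "device", "Thin photoresist layer"])
print(kept)
```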

[0062] To select relevant noun groups, they are estimated according to the following rules:

[0063] A. All significant words (nouns and adjectives) are extracted from the noun group by tags.

[0064] B. The estimating value (weight) of each significant word of the noun group is calculated. The estimation algorithm takes into account:

[0065] word frequency in the document;

[0066] word position in subject or object;

[0067] presence of given word in title, etc.

[0068] C. The final estimation of the noun group is calculated as the arithmetic mean of the estimating values of all its constituent significant words.

[0069] The noun group most relevant to the source document has the highest estimating value.
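Rules A to C can be illustrated with the following sketch. The text names the factors that enter a word's weight but gives no formula, so the additive combination and the bonus values `SO_BONUS` and `TITLE_BONUS` are assumptions made purely for illustration; only the arithmetic mean in step C is taken from the source. The significant words are assumed to have been picked out by tags already.

```python
from collections import Counter
from statistics import mean

SO_BONUS = 1.0     # assumed bonus for a word found in a subject or object
TITLE_BONUS = 2.0  # assumed bonus for a word found in the title


def word_weight(word, frequency, so_words, title_words):
    """Step B: weight of one significant word (assumed additive combination)."""
    weight = float(frequency[word])
    if word in so_words:
        weight += SO_BONUS
    if word in title_words:
        weight += TITLE_BONUS
    return weight


def noun_group_score(significant_words, frequency, so_words, title_words):
    """Step C: arithmetic mean of the weights of the constituent words."""
    return mean([word_weight(w, frequency, so_words, title_words) for w in significant_words])


text = "thin photoresist layer is heated by uv laser light photoresist"
frequency = Counter(text.split())
so_words = {"thin", "photoresist", "layer", "uv", "laser", "light"}
title_words = {"photoresist"}

score = noun_group_score(["thin", "photoresist", "layer"], frequency, so_words, title_words)
print(score)
```

With these assumed bonuses, "photoresist" (two occurrences, in an object, in the title) weighs 5.0 and the other two words 2.0 each, so the noun group scores 3.0.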

[0070] A list of selected noun groups 220 advances into an editing module 230 and the user can remove, edit, and/or classify the selected noun groups in editing unit 240. A list of these edited noun groups 250 passes into the tree formation or renewal module 130 and serves for expansion of the tree 140.

[0071] The data in tree 140 of the CIO KB passes into a CIO KB formation module 260. This module forms the CIO KB 300 with the help of the tree 140 and SAO KB 290. The CIO KB includes the SAOs with objects containing the expressions from the tree 140 of the CIO KB.

[0072] To form the CIO KB, a search is performed for SAOs whose objects contain the expressions of the last level of the tree. Then the found SAOs, their original sentences, and references are joined with the given branch of the tree.
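The selection of SAOs for each branch might look like the sketch below. The dictionary layout of the tree and of each SAO, the function name `build_cio_kb`, and the use of case-insensitive substring matching for "contain" are assumptions; the source does not define the matching rule.

```python
def build_cio_kb(tree, saos):
    """Attach each SAO to every branch whose expressions occur in its object."""
    kb = {branch: [] for branch in tree}
    for sao in saos:
        obj = sao["object"].lower()
        for branch, expressions in tree.items():
            if any(e.lower() in obj for e in expressions):
                kb[branch].append(sao)
    return kb


# Last-level branches mapped to their synonymous expressions (from the examples).
tree = {
    "Ultraviolet radiation": ["Ultraviolet radiation", "UV laser light", "UV light"],
    "Wafer": ["Wafer", "Substrate"],
}
saos = [
    {"subject": "convex lens", "action": "focus", "object": "ultraviolet radiation"},
    {"subject": "objective lens", "action": "condense", "object": "UV laser light"},
    {"subject": "mirror", "action": "reflect", "object": "beam"},
]

cio_kb = build_cio_kb(tree, saos)
print({branch: len(found) for branch, found in cio_kb.items()})
```

In a fuller version each SAO would also carry its original sentence and reference, so that they travel with it into the branch.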

[0073] All the SAOs are grouped by folders according to tree branches. SAOs inside every folder can be placed alphabetically or grouped by subfolders with the help of an action dictionary 270.

[0074] Subfolders are formed on the basis of actions in the dictionary 270. The latter contains six parts, namely:

[0075] List of verbs divided in groups containing the verbs with similar sense (heat-warm, produce-create-generate, etc.);

[0076] List of “verb-noun” expressions synonymous with other verbs(heat—increase temperature—rise temperature, etc.)

[0077] List of “verbsA” including the verbs perform, carry out, realize, and other verbs with similar sense;

[0078] List of “noun” including the following groups: “verb—relevant verbal noun” (heat—heating; produce—production, etc.);

[0079] List of “verbsB” including the verbs produce, create, form, and other verbs with similar sense;

[0080] List of “participle2” including the following groups: “verb—relevant participle2” (heat—heated; produce—produced, etc.).

[0081] The use of action dictionary 270 allows collection of SAOs with similar actions. For example, the program can collect SAOs with the following AO: “heat—something”, “increase—temperature of something”, “perform—heating of something”, and “produce—heated something” into a single subfolder with the name “heat—something”.
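A much simplified sketch of subfolder formation follows. It models only the first part of the action dictionary (groups of verbs with similar sense) and ignores the verb-noun, verbal-noun, and participle lists. The produce/create/generate group is from the text; the focus/condense group is inferred from the folder example in this document; the function names are hypothetical.

```python
from collections import defaultdict

# Verb groups with similar sense; the first verb of a group names the subfolder.
VERB_GROUPS = [
    ("heat", "warm"),
    ("produce", "create", "generate"),
    ("focus", "condense"),  # inferred from the example folders, not listed in the text
]


def canonical_action(verb, verb_groups=VERB_GROUPS):
    """Map a verb to the leading verb of its group, or to itself if ungrouped."""
    for group in verb_groups:
        if verb in group:
            return group[0]
    return verb


def group_by_action(saos):
    """Collect (subject, action, object) triples into subfolders by canonical action."""
    subfolders = defaultdict(list)
    for subject, action, obj in saos:
        subfolders[canonical_action(action)].append((subject, action, obj))
    return dict(subfolders)


saos = [
    ("convex lens", "focus", "ultraviolet radiation"),
    ("electron-molecule collision", "generate", "ultraviolet radiation"),
    ("micro-lens array plate", "focus", "UV light"),
    ("objective lens", "condense", "UV laser light"),
    ("plasma", "produce", "intense ultraviolet radiation"),
    ("surface or corona discharge", "produce", "ultraviolet radiation"),
]

subfolders = group_by_action(saos)
print({name: len(items) for name, items in subfolders.items()})
```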

[0082] The proposed tool may for example operate as follows:

[0083] At the beginning we have some data 120 for the tree 140 (it is possible to begin from one word or expression):

First level of tree | Last level of tree | Synonymous or near-synonymous expressions for last level of tree (used for search in object/subject in SAO KB)
Lithography         | Imaging system     | Imaging optics; Imaging system
                    | Phase shifter      | Phase shifter; Phase shifting mask; Phase shift region; Phase shifter material
                    | Resist             | Photoresist; Resist mask; Layer of photoresist; Layer of resist; Photoresist layer; Resist film; Resist

[0084] Tree formation or renewal module 130 forms the tree 140. This tree 140 is the source for forming the query 160 with module 150. The query can have different configurations depending on the user's choice.

[0085] For example, it is possible to form the following queries from the above-mentioned tree:

[0086] [Imaging system] OR [Optical imaging system] OR [Imaging optics];

[0087] [Phase shifter] OR [Phase shifting mask] OR [Phase shift region] OR [Phase shifter material];

[0088] [Resist] OR [Photoresist] OR [Resist mask] OR [Layer of photoresist] OR [Layer of resist] OR [Photoresist layer] OR [Resist film];

[0089] or

[0090] [Lithography] AND {[Imaging system] OR [Optical imaging system] OR [Imaging optics]};

[0091] [Lithography] AND {[Phase shifter] OR [Phase shifting mask] OR [Phase shift region] OR [Phase shifter material]};

[0092] [Lithography] AND {[Resist] OR [Photoresist] OR [Resist mask] OR [Layer of photoresist] OR [Layer of resist] OR [Photoresist layer] OR [Resist film]}.
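Both query configurations can be generated mechanically from a branch of the tree. The sketch below assumes a nested-dictionary form of the tree and hypothetical function names; the bracket and brace notation follows the examples above.

```python
def or_query(expressions):
    """Join last-level expressions with OR, e.g. [Wafer] OR [Substrate]."""
    return " OR ".join(f"[{e}]" for e in expressions)


def and_or_query(parent, expressions):
    """Prefix the OR group with the name of a higher level joined by AND."""
    return f"[{parent}] AND {{{or_query(expressions)}}}"


# Hypothetical in-memory form of one first-level branch of the tree.
tree = {
    "Lithography": {
        "Phase shifter": [
            "Phase shifter",
            "Phase shifting mask",
            "Phase shift region",
            "Phase shifter material",
        ],
    }
}

for parent, branches in tree.items():
    for name, synonyms in branches.items():
        print(or_query(synonyms))
        print(and_or_query(parent, synonyms))
```

The second form narrows the search to documents that also mention the higher-level term, which is why the text describes it as more accurate.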

[0093] The search module 170 performs a search of documents according to the queries 160. The semantic processor 190 treats the found documents. This results in SAOs 280 that are transmitted into an SAO KB 290. Besides SAOs, the semantic processor 190 forms the list of noun groups 200, which are absent from the initial queries. Selection module 210 filters these noun groups to remove non-informative data. According to an embodiment, filtration is accomplished with the help of a stop-dictionary and/or selection of the most relevant noun groups. Then the user can remove, edit, and classify these noun groups with the help of editing module 230. This produces the list of edited and classified noun groups 250, which are added into the initial tree of the CIO KB 300 by tree formation or renewal module 130:

First level of tree | Last level of tree    | Synonymous or near-synonymous expressions for last level of tree (used for search in object/subject in SAO KB)
Lithography         | Ultraviolet radiation | Far-ultra violet light; UV laser light; Ultraviolet radiation; UV light; UV radiation
                    | Wafer                 | Wafer; Substrate; Wafer disk
                    | Opaque layer          | Opaque layer; Opaque pattern layer; Opaque metal layer; Opaque surface layer
                    | Antireflection layer  | Antireflection layer; Antireflection multilayer film; Antireflection film; Surface of antireflection film

[0094] Thus, the initial tree (which contained three branches: Imaging system, Phase shifter, Resist) is converted into a more complicated tree with additional branches (Ultraviolet radiation, Wafer, Opaque layer, Antireflection layer).

[0095] The module 260 forms the CIO KB 300 from the SAO KB 290 with the help of the renewed tree 140 and action dictionary 270. At first, a search is performed for SAOs whose objects contain the expressions of the last level of the tree. All the found SAOs, their original sentences, and references are grouped by folders according to tree branches. For example, tree branch “Ultraviolet radiation” collects the following SAOs, their original sentences, and references:

[0096] Ultraviolet Radiation

[0097] convex lens—focus—ultraviolet radiation

[0098] The air filter includes a cabinet which houses an electrostatic air filter, an ultraviolet lamp and a parabolic reflector or a convex lens for focusing the ultraviolet radiation emitted by the lamp on an upstream side of the air filter.

[0099] \\Nilitis_srv\Patents\1998\November\US5837207

[0100] electron-molecule collision—generate—ultraviolet radiation

[0101] The electrons are maintained at this temperature for a sufficient time to enable the free electrons to dissociate the waste material as a result of collisions and ultraviolet radiation generated in situ by electron-molecule collisions.

[0102] \\Nilitis_srv\Patents\1994\February\US5288969

[0103] micro-lens array plate—focus—UV light

[0104] Second, in a LCD utilizing phosphor elements as light source, a micro-lens array plate can be used to focus the UV light onto the phosphor elements for reduction of power consumption by the lamps.

[0105] \\Nilitis_srv\Patents\1999\February\US5871653

[0106] objective lens—condense—UV laser light

[0107] The UV laser light is then reflected by the mirror 14 and condensed by an objective lens 6 so as to be radiated on an optical disc 8.

[0108] \\Nilitis_srv\Patents\1998\October\US5822287

[0109] plasma—produce—intense ultraviolet radiation

[0110] An advantageous development is that the plasma that produces the intense ultraviolet radiation in the wavelength below 200 nm is excited in the laser.

[0111] \\Nilitis_srv\Patents\1993\September\US5244428

[0112] surface or corona discharge—produce—ultraviolet radiation

[0113] A miniature solid state laser is optically pumped by ultraviolet radiation produced by a surface or corona discharge.

[0114] \\Nilitis_srv\Patents\1999\June\US502387

[0115] Then SAOs inside every folder are grouped by subfolders with the help of the action dictionary 270:

[0116] Ultraviolet Radiation

[0117] Focus Ultraviolet Radiation

[0118] convex lens—focus—ultraviolet radiation

[0119] The air filter includes a cabinet which houses an electrostatic air filter, an ultraviolet lamp and a parabolic reflector or a convex lens for focusing the ultraviolet radiation emitted by the lamp on an upstream side of the air filter.

[0120] \\Nilitis_srv\Patents\1998\November\US5837207

[0121] micro-lens array plate—focus—UV light

[0122] Second, in a LCD utilizing phosphor elements as light source, a micro-lens array plate can be used to focus the UV light onto the phosphor elements for reduction of power consumption by the lamps.

[0123] \\Nilitis_srv\Patents\1999\February\US5871653

[0124] objective lens—condense—UV laser light

[0125] The UV laser light is then reflected by the mirror 14 and condensed by an objective lens 6 so as to be radiated on an optical disc 8.

[0126] \\Nilitis_srv\Patents\1998\October\US5822287

[0127] Produce Ultraviolet Radiation

[0128] electron-molecule collision—generate—ultraviolet radiation

[0129] The electrons are maintained at this temperature for a sufficient time to enable the free electrons to dissociate the waste material as a result of collisions and ultraviolet radiation generated in situ by electron-molecule collisions.

[0130] \\Nilitis_srv\Patents\1994\February\US5288969

[0131] plasma—produce—intense ultraviolet radiation

[0132] An advantageous development is that the plasma that produces the intense ultraviolet radiation in the wavelength below 200 nm is excited in the laser.

[0133] \\Nilitis_srv\Patents\1993\September\US5244428

[0134] surface or corona discharge—produce—ultraviolet radiation

[0135] A miniature solid state laser is optically pumped by ultraviolet radiation produced by a surface or corona discharge.

[0136] \\Nilitis_srv\Patents\1991\June\US502387

[0137] An illustration obtained for CIO KB 300 appears in FIG. 4a.

[0138] According to an embodiment, the CIO KB is used for storage and fast search of information concerning various technical problems. A user can accomplish the search by browsing in the tree or with the help of “Extended Find” as shown in FIG. 4b. The information is presented to the user in a few forms:

[0139] brief form: as an SAO (for example, “moving of light condenser—harden—electrodeposited photoresist”);

[0140] more extended form: as the original sentence (for example, “If the light condensers are moved horizontally, the electrodeposited photoresist on the whole surface of the board and in the holes can be totally hardened.”);

[0141] reference form: as a reference (URL) to the corresponding document (in our example, U.S. Pat. No. 5,258,808; see FIG. 4c).

[0142] Thus, the user has the possibility of both a fast review of information (in SAO form and original sentence) and careful study of a reference document.

[0143] It will be understood that various other display symbols, emblems, colors, and configurations can be used instead of those disclosed for the exemplary embodiments herein. Also, various improvements and modifications can be made to the herein disclosed exemplary embodiments without departing from the spirit and scope of the present invention. The system and method according to the inventive principles herein are not necessarily dependent upon the precise exemplary hardware or software architecture disclosed herein.

[0144] The term “stop-dictionary” is the common name for dictionaries that remove from a list, or prohibit the display of, words (or expressions) that appear in these dictionaries.

[0145] A user may use the CIO KB for categorization of knowledge (in the form of both SAOs and noun groups), which is extracted from documents with the help of the semantic processor. A user may employ the CIO KB for categorization of documents because it contains references to documents from which SAOs and noun groups are extracted. A user can define peculiarities of the categorization by forming an initial tree and editing the renewed tree.

[0146] A user can store the CIO KB as a repository for information relevant to the user's technology or interest and access outside sources such as the Internet only for updates.

1. A method of forming a customized industry-oriented knowledge base (CIO KB) in a computer, comprising: submitting a computer search query concerning an industry with a knowledge base tree, and with an extraction section extracting documents from a document source on the basis of the query; semantically processing language from extracted documents in a semantic processor of the computer to obtain subject-action-object groups (SAOs); selecting relevant results from the SAOs and entering the relevant results in the knowledge base tree; successively submitting queries from the knowledge base tree so as to extract additional documents from the document source and semantically process SAOs from extracted documents and in a loop successively reentering relevant results obtained from the SAOs back into the knowledge base tree; and extracting information from the knowledge base tree and the SAOs to produce a CIO KB.
2. A method as in claim 1, wherein the relevant results are noun groups selected from the SAOs.
3. A method as in claim 1, further comprising adding a list of actions from an actions dictionary to the.
4. A method as in claim 1, wherein the step of submitting queries includes submitting a list of queries from the names of items or processes, or their parameters extracted from a given branch or branches of the knowledge base tree.
5. A method as in claim 1, wherein the step of extracting the documents from an external source includes extracting the documents from the World Wide Web, or intranet.
6. A method as in claim 1, wherein the step of semantically processing language from the extracted documents includes extracting subject-action-object (SAO) relations and noun groups from the documents.
7. A method as in claim 6, wherein the noun groups represent the names of items, processes, or parameters.
8. A method as in claim 1, wherein selection of the relevant results includes selection by statistics, or intersections of relevant results concerning a given industry or discipline.
9. A method as in claim 8, wherein the relevant results are edited.
10. A method as in claim 6, wherein selection of the noun groups include selection by statistics, or intersections of noun groups concerning a given industry or discipline.
11. A method as in claim 7, wherein the noun groups are edited manually.
12. A method as in claim 1, wherein a query is submitted from a branch of the knowledge base tree and the relevant results is reentered into the same branch of the knowledge base tree.
13. A method as in claim 1, wherein the semantically processed data is formed into SAOs and merged into an SAO knowledge base (SAO KB).
14. A method as in claim 12, wherein said SAO KB and said knowledge base tree form said CIO KB.
15. A computer system for forming a customized industry-oriented knowledge base (CIO KB) in a computer, comprising: a knowledge base tree; an extraction section for submitting a computer search query concerning an industry from the knowledge base tree and extracting documents from a document source on the basis of the query; a processing section for semantically processing language from extracted documents to obtain subject-action-object groups (SAOs); a selection section for selecting relevant results from the SAOs and entering the relevant results back into the knowledge base tree; and a formation section for extracting information from the knowledge base tree and the SAOs to produce a CIO KB.
16. A system as in claim 14, wherein the relevant results are noun groups selected from the SAOs.
17. A system as in claim 14, wherein the formation section includes an actions dictionary.
18. A system as in claim 14, wherein the knowledge base tree submits queries including from the names of items or processes, or their parameters extracted from a given branch or branches of the knowledge base tree.
19. A system as in claim 14, wherein the extracting section extracts the documents from an external source including the World Wide Web, or intranet.
20. A system as in claim 14, wherein the processing section extracts subject-action-object (SAO) relations and noun groups from the documents.
21. A system as in claim 20, wherein the noun groups represent the names of items, processes, or parameters.
22. A system as in claim 14, wherein the selection section selects by statistics, or intersections of relevant results concerning a given industry or discipline.
23. A system as in claim 21, wherein the selection section includes an editing unit.
24. A system as in claim 19, wherein the selection section selects noun groups by statistics, or intersections of noun groups concerning a given industry or discipline.
25. A system as in claim 20, wherein selection section includes a manual editor.
26. A system as in claim 14, wherein said tree has branches and query is submitted from a branch of the knowledge base tree and the relevant results is reentered into the same branch of the knowledge base tree.
27. A system as in claim 14, wherein the processing section includes an SAO knowledge base (SAO KB) for storing the SAOs.
28. A system as in claim 27, wherein said SAO KB and said knowledge base tree form said CIO KB with an action dictionary.